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Abstract — In this paper, we derive the explicit series expansion 
of the eigenvalue distribution of various models, namely the case 
of non-central Wishart distributions, as well as correlated zero 
mean Wishart distributions. The tools used extend those of the 
free probability framework, which have been quite successful 
for high dimensional statistical inference (when the size of the 
matrices tends to infinity), also known as free deconvolution. This 
contribution focuses on the finite Gaussian case and proposes 
algorithmic methods to compute the moments. Cases where 
asymptotic results fail to apply are also discussed. 

Index Terms — Gaussian matrices, Random Matrices, convolu- 
tion, limiting eigenvalue distribution. 



I. Introduction 

Random matrix and free probability theory have fruitful 
applications in many fields of research, such as digital commu- 
nication U, mathematical finance [2| and nuclear physics Q. 
In particular, the free probability framework (4), (3], Q, 
(8) can be used for high dimensional statistical inference (or 
free deconvolution), i.e., to retrieve the eigenvalue distributions 
of involved functionals of random matrices. The general idea 
of deconvolution is related to the following problem J9j : 

Given A, B two n X n independent square Hermitian (or 
symmetric) random matrices: 

1) Can one derive the eigenvalue distribution of A from the 
ones of A + B and B? If feasible in the large n-limit, this 
operation is named additive free deconvolution, 

2) Can one derive the eigenvalue distribution of A from the 
ones of AB and B? If feasible in the large n-limit, this 
operation is named multiplicative free deconvolution. 

In the literature, deconvolution for the large n-limit has 
been studied, and the methods generally used to compute it 
are the method of moments [4], and the Stieltjes transform 
method ifTOl . The expressions turn out to be quite simple if 
some kind of asymptotic freeness [8 | of the matrices involved 
is assumed. However, freeness usually does not hold for finite 
matrices. Quite remarkably, the method of moments can still 
be used to propose an algorithmic method to compute these 
operations. The goal of this contribution is exactly to propose 
a general finite dimensional statistical inference framework 
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based on the method of moments, which is implemented in 
software. As the calculations are quite tedious, and for sake 
of clarity, we focus in this contribution on Gaussian matricefl 
The method of moments (9) is based on the relations 
between the moments of the different matrices involved. It 
provides a series expansion of the eigenvalue distribution of 
the involved matrices. For a given n x n random matrix A, 
the p-th moment of A is defined as 



i n ^ p = E [tr(A p )] = f\ p dp n (X) 



(1) 



where E is the expectation, tr the normalized trace, and dp n 
the associated empirical mean measure defined by dp n (X) = 
^ (n ^(^ — ^*))» wnere are the eigenvalues of A. 

Quite remarkably, when n — > oo, converges in many cases 
almost surely to an analytical expression t p A that depends only 
on some specific parameters of A (such as the distribution 
of its entriesjl This enables to reduce the dimensionality of 
the problem and simplifies the computation of convolution of 
measures. In recent works deconvolution has been analyzed 
when n — > oo for some particular matrices A and B, such as 
when A and B are free [ 131, or A random Vandermonde and 
B diagonal ifTTIl. lfl2l. 

The inference framework described in this contribution is 
based on the method of moments in the finite case: it takes a 
set of moments as input, and produces a set of moments as 
output, with the dimensions of the matrices considered finite. 
The framework is flexible enough to allow for repeated combi- 
nations of the random matrices we consider, and the patterns 
in such combinations are reflected nicely in the algorithms. 
The framework also lends itself naturally to combinations with 
other types of random matrices, for which support has already 
been implemented in the framework JT2|. This flexibility, 
exploited with the method of moments, is somewhat in contrast 
to methods such as the Stieltjes transform method iflOl . where 
combining patterns of matrices naturally leads to more com- 
plex equations for the Stieltjes transforms (when possible) and 
can only be performed in the large n-limit. While the simplest 
patterns we consider are sums and products, we also consider 
products of many independent matrices. The algorithms are 
based on iterations through partitions and permutations as 
in fl4l . where the case of a Wishart matrix was considered. 
Our methods build heavily on the simple form which the 
moments of complex Gaussian random variables have, as 
exploited in lfl4l . We remark that, in certain cases, it is possible 
to implement the method of moments in a different way 

C ases s uch as Vandermonde matrices can also be implemented in the same 
vein H71 . fl2l . The general case is, however, more difficult. 

2 Note that in the following, when speaking of moments of matrices, we 
refer to the moments of the associated measure. 
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also [151, IfTSl . However, we are not aware of any attempts to 
make an inference framework as general as the one presented 
here. The case presented in |[T6l , for instance, handles only 
certain zero-mean, one-sided correlated Wishart matrices. 

The paper is organized as follows. Section [II] provides 
background essentials on random matrix theory and combi- 
natorics needed to state the main results. Parts of Section [II] 
is rather technical, but it is not necessary to understand all 
details therein to understand the statement of the main results. 
These are summarized in Section [III] First, algorithms for 
the simplest patterns (sums and products of random matrices) 
in the finite dimensional statistical inference framework are 
presented. Then, recursive algorithms for products of many 
Wishart matrices and a deterministic matrix are included, as 
well with some general remarks on how the general situa- 
tion can be attacked from these basic algorithms. We then 
explain how algorithms for deconvolution can be obtained 
within the same framework, and formalize the corresponding 
moment estimators. Section[lV]presents details on the software 
implementation of the finite dimensional statistical inference 
framework. Section [V] presents some simulations and useful 
applications showing the implications of the presented results 
in various applied fields. 

II. Random matrix Background Essentials 

In the following, upper boldface symbols will be used for 
matrices, whereas lower symbols will represent scalar values. 
(.) T will denote the transpose operator, (.)* conjugation, and 
(.) H = ((-) T ) hermitian transpose. I n will represent the n x n 
identity matrix. We let Tr be the (non-normalized) trace for 
square matrices, defined by, 



Tr(A) 



i=l 



where an are the diagonal elements of the n x n matrix A. 
We also let tr be the normalized trace, defined by tr(A) = 
iTr(A). When A is non-random, there is of course no need 
to take the expectation in ((T). D will in general be used to 
denote such non-random matrices, and if Di , . . . , D r are such 
matrices, we will write 

11 i tr(D 4 D, ;•. (2) 

whenever 1 < ii,...,i s < r. (O are also called mixed 
moments. 

To state the results of this paper, random matrix concepts 
will be combined with concepts from partition theory. 7(n) 
will denote the partitions of {1, .. .,n}. For a partition p = 
{Wi, W r } £ y{n), Wx,...,W r denote its blocks, while 
\p\ = r denotes the number of blocks. We will write k ^ p I 
when k and I belong to the same block of p. Partition notation 
is adapted to mixed moments in the following way: 

Definition 1: For p = {Wi,...,Wfc}, with Wi — 
{wn, . . . , %|w<|}> we define 



D Wi = D, 



i=l 



(3) 



(4) 




(a) 2, 3 identified, (b) 2, 1 identified, 
4, 1 also 4, 3 also 



\ 



(c) Graph result- (d) Graph result- 
ing from (a). ing from (b). 

Fig. I . Diagrams demonstrating how the second moment of a Wishart matrix 
can be computed. In all possible ways, even-labeled edges are identified with 
odd-labeled edges. There are two possibilities, shown in (a) and (b). The 
resulting graphs after identifications are shown in (c) and (d). The second 
moment is constructed by summing contributions from all such possible 
identifications (here there are only 2). The contribution for any identification 
depends only on n, N, and the number of even-labeled and odd-labeled 
vertices in the resulting graphs (c) and (d). We will write down these 
contributions later. The labels D and E are included since we later on will 
generalize to compute the moments of doubly correlated Wishart matrices, 
where the correlation matrices are denoted D and E. 



With the empirical eigenvalue distribution of a hermitian 
random matrix A, we mean the (random) function 

#{*|A* < A} 



*a(A) 



(5) 



where are the (random) eigenvalues of A. In many cases, 
the moments determine the distribution of the eigenvalues [ 1 8 1 . 
Due to the expectation in (Q3, the results in this paper thus 
apply to the mean eigenvalue distribution of certain random 
matrices. 

In the following, we will denote a standard complex Gaus- 
sian matrix by X. Standard complex means that the matrix 
has i.i.d. complex Gaussian entries, with zero mean and unit 
variance (in particular, the real and imaginary parts of the 
entries are independent, each with mean 0, variance i). X 
will sometimes also be used to denote a standard selfadjoint 
Gaussian matrix, standard selfadjoint meaning that it has i.i.d. 
entries only above or on the main diagonal, with the real and 
imaginary parts independent with variance ^ JSJ. The matrix 
sizes in the following will be denoted n x N for rectangular 
matrices, n x n for square matrices. All random matrices 
we consider will be using selfadjoint or complex Gaussian 
matrices as building blocks. 

A. The diagrammatic method 

Some schools of science learn methods for computing 
the moments of Gaussian matrices from diagrams. As an 
example of what we mean by this, we have in Figure Q] 
demonstrated how the second moment of a Wishart matrix 
^XX fl can be found in this way. In Figure [2] we have 
similarly demonstrated how the second moment of a matrix 
on the form (D + X)(E + X.) H can be found, where D and 
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2 1 




2 1 




4 3 34 3 34 3 34 3 3 

(a) 4, 1 identified (b) 4, 3 identified (c) 2, 1 identified (d) 2, 3 identified 



\ 



D 



E" 



E H 



E H 



E H 



D 



D 



\ 



(e) Graph result- (f) Graph result- (g) Graph result- (h) Graph result- 
ing from (a). ing from (b). ing from (c). ing from (d). 

Fig. 2. Diagrams demonstrating how the second moment of (D + X)(E + 
~X.) H can be computed. As in Figure[T] even-labeled and odd-labeled edges are 
identified in all possible ways, but this time we also perform identifications of 
subsets of edges (edges not being identified correspond to choices from D and 
'Ei H ). In addition to the identifications in FigurefT] we thus also have the ones 
in (a)-(d), where only half of the edges are identified. We also have the case 
where there are no identifications at all. The second moment is constructed 
by summing contributions from all such possible "partial" identifications, and 
the contribution for any identification is computed similarly as with Figure [T] 




2 1 



2 4 



3 4 




(a) 2,4 identified (b) 2,4 identified, 
1,3 also 




(c) Graph (d) Graph result- 
resulting ing from (b). 
from (a). 

Fig. 3. Diagrams demonstrating how the second order moment of R + X 
can be found, with X a selfadjoint, Gaussian matrix. As in Figures [T] and [2] 
all possible identifications of edges are considered, but irrespective of whether 
they are even-odd pairings. In particular, all identifications from these figures 
are considered, with the arrows allowed to go any way. In addition, we also 
get identifications like in (a) and (b), where odd-labeled edges are identified, 
or even-labeled edges are identified. 



E are independent from X. This matrix form is much used 
when combining many observations of a random vector. While 
these two figures assume a complex Gaussian matrix, Figure [3] 
explains how the diagrammatic method can be modified to 
compute the second moment of R+X, where X is selfadjoint, 
Gaussian, and R is independent from it. The diagrammatic 
method is easily generalized to higher moments, and to other 
random matrix models where Gaussian matrices are building 
blocks. 

The simple ingredient behind the diagrammatic method 



is the fact that one only needs consider conjugate pairings 
of complex Gaussian elements [14], which simplifies the 
computation of moments to simple identification of edges in 
graphs in all possible ways, as illustrated. This simple fact 
will be formalized in the following combinatorial definitions, 
which will be needed for the main results. The stated formulas 
are not new, since it has been known for quite some time that 
the diagrammatic method can be used to obtain them. The 
value in this paper therefore does not lie in these formulas, 
but rather in making the general results possible to write down 
within a framework, and available for computation in terms of 
an accompanying software implementation. 

Without going in all the details, there are similarities with 
the sketched diagrammatic approach, and other approaches 
based on diagrammatics. In particular in physics, and espe- 
cially the field of statistical mechanics (see e.g. [1191 . |20|). 
It has been used recently in the field of wireless communica- 
tions, related to the analysis of the mean and the variance 
of the Signal to Noise Ratio at the output of the MMSE 
receiver in MIMO and OFDM-CDMA systems fl2T|. Instead 
of calculating all the moments individually, one can represent 
these operations diagrammatically by solid lines and dashed 
lines. The idea is to draw them using Feynman rules derived 
from a generating function, and perform a resummation of all 
relevant graphs where averaging over matrices corresponds to 
connecting in all possible ways the different lines seperately. In 
many cases, in the large TV-limit, only terms with non-crossing 
lines survive, A general description is proposed in [22|, [23 1, 
[24|. The nomenclature we use for stating our results deviate 
some from that found in the literature. 

To explain better how the diagrammatic method is con- 
nected to random matrices, write the trace of a product of 
matrices as 



E [tr(AiA 2 ■ 
1 



■A p ) 



n 



a^(i 1 ,i 2 )a^(i 2 ,i 3 )---a^(i p7 i 1 ), (6) 

11 ,i 2 ,..., 

where the entries of A& are a^ k '(i,j). We will visualize 
the matrix indices ii,,..i p as points (in the following also 
called vertices) l,...,p on a circle, and the matrix entries 
oS 1 '(ii,i2), •••> a^(ip, ii) as edges labeled l,...,p, with the 
points k, k + 1 being the end points of the edge labeled k. 
We will call this the circular representation of ©. If X is 
nxN standard, complex, Gaussian, the circular representation 
of E (tr((XX ff ) 4 )), before any Gaussian pairings have taken 
place, is thus shown in Figure |4] 

More general than ©, we can have a product of k traces, 

E [tr (Ai • • ■ A P1 ) • • ■ tr (A pl+ ... +Pk _ 1+1 ■ ■ ■ A pl+ ... +Pk )] . 

(7) 

This will also be given an interpretation in terms of k circles, 
with pi,...,pk points/edges on each, respectively. Conjugate 
pairings of complex Gaussian elements in (Q in all possible 
ways are performed as in the case of one circle, and is 
illustrated in Figure for fc = 2 and pi = p 2 = 3, with 
the first edge on the first cirle paired with the last edge on 
the second circle. In ©, this corresponds to a}- 1 ' (11,12) and 



,.(12; 



(?i2,«7) being conjugate of each other (there are twelve 
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Fig. 4. Ordering of points and edges on a circle for the trace 
E (tr((XX^) 4 )) for X complex, standard, Gaussian, when it is written out 
as in (6). The odd edges 1, 3, 5, 7 correspond to terms of the form X; the 
even edges 2,4,6,8 correspond to terms of the form X H ; the odd vertices 
1, 3, 5, 7 correspond to a choice among 1, . . . , n (i.e. among row indices in 
X) ; the even vertices 2, 4, 6, 8 correspond to a choice among 1, . . . , TV (i.e. 
among column indices in X). The bars, used to differ between edges and 
vertices, are only used in the figures. 

matrices present here,since = X^X^). This can only be 
the case if i\ — ij and i 2 — iiz. 




Fig. 5. Identification of edges across two circles. Such identifications arise 
in the computations of {7}- 



B. Formalizing the diagrammatic method 

The following definition, which is a generalization 
from [14], formalizes identifications of edges, as we have 
illustrated: 

Definition 2: Let p be a positive integer. By a partial 
permutation we mean a one-to-one mapping tt between two 
subsets pi,p 2 of {1, ...,p}. We denote by SP P the set of 
partial permutations of p elements. When tt E SP p , we define 
tt € SP 2p by 

n(2j-l) = 2^ 1 (j), jep 2 
tt(2j) = 2tt(j) - 1, j G Pl . 

Note that in this definition, subtraction is performed in such 
a way that the result stays within the same circle. In terms of 
Figure [5] this means that 1 — 1 = 6,6—1 = 5 and 7 — 1 = 
12,12 — 1 = 11. Addition is assumed to be performed in 
the same way, so that 1 + 1 = 2,6 + 1 = 1, and 7 + 1 = 
8, 12 + 1 = 7. In the following, this convention for addition 
and subtraction will be used, and the number of edges on the 
circles will be implicitly assumed, and only mentioned when 
strictly needed. 

When we compute tr(((D + X)(E + X) H ) P ), we multiply 
out to obtain a sum of terms of length 2p on the form 
(|6]l, where the terms are one of X,X H ,D, or ~E H , with •- 
and —terms appearing in alternating order. p\ corresponds 



to the indices of X in such a term (after their order of 
appearance), p 2 to the indices of X, and tt to the Gaussian 
conjugate pairings. Computing tr(((D + X)(E + X)- ff ) p ) thus 
boils down to iterating through SP p . In Figure [2] this was 
examplified with p — 2, with the sizes of the subsets equal 
to 1. pi was indicated by the starting edges of the arrows, p 2 
by the ending edges, (a) to (d) represents the only possible 
pairings in the terms XE ff DX H , DE H XX ff , XX H DE H , 
and DX^XE^, respectively. The case for a Wishart matrix 
is simpler, since there is no need to multiply out terms, and 
we need only consider tt e SP p where = |pa| = p, as 
shown in Figure [T] Such tt are in one-to-one correspondence 
with S p , the set of permutations of p elements. 

It is clear that tt maps (2pi) onto U(2p 2 — 1), and has period 
two (i.e. tt(tt(p)) — p for all p), where 2p 2 — l = {2fc— l\k £ 
p 2 }. In particular, tt maps even numbers to odd numbers, and 
vice versa. When edges are identified as dictated by a partial 
permutation, vertices are identified as dictated by the partition 
p(ir) defined as follows: 

Definition 3: Let tt be a partial permutation, and let tt be 
determined by p\ , p 2 and a pairing between them. We associate 
to tt an equivalence relation p = p(tt) on {1, 2p] generated 
by 

J ~p *(?') + 1. J + 1 ~p Hj)> for j £ p\. (8) 

We let k(p) and l(p) denote the number of blocks of p 
consisting of only even or odd numbers, respectively. 

Any block in p consists either of even numbers, or odd 
numbers, since tt maps between even and odd numbers, so 
that the definitions of k(p) and l(p) above make sense. In 
the following, we will let k\, . . . , kf.( p ) be the cardinalities of 
the blocks consisting of even numbers only, and l\, . . . , Z;( p ) 
the cardinalities of the blocks consisting of odd numbers only. 
The restriction of p to the odd numbers thus defines another 
partition, which we will denote p|odd. Similarly, the restriction 
of p to the even numbers yields another partition, which we 
will denote p|even. ,o|odd and p|even will appear in the main 
results later on. 

p should be interpreted as an equivalence relation on matrix 
indices occuring in X, X H . The following definition similarly 
keeps track of how conjugate pairings group matrix indices 
occuring in D, E H into traces: 

Definition 4: Let T> C {1, .., 2p} be the set of deterministic 
edges (i.e. edges corresponding to ocurrences of D, E H ), and 
let tt € SP p be determined by pi,p 2 - & — alrr) is defined as 
the equivalence relation on D generated by the relations 

k ~v k + 1 if k, k + 1 € D (9) 
k~ a l if k,le'D,k + l~pl. (10) 

Let also kd(p) be the number of blocks of p contained within 
the even numbers which intersect CDU(D + 1), and let ld(p) be 
the number of blocks of p contained within the odd numbers 
which intersect D U (D + 1). 

Two edges from D belong to the same block of a if, after 
identifying edges, they are connected with a path of edges 
from D. A block of p which contains a vertex from D U 
(D + 1) corresponds to a matrix index which occurs in a 
deterministic element. As an example, in Figure [2] all four 
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partial permutations are seen to give rise to a a with one 
block only. They are seen to be a — {2, 3} for (a), a = {1, 2} 
for (b), a = {3, 4} for (c), and a = {1, 4} for (d). 



the singular law of the matrices. In the third section we state 
similar results for the case where the Gaussian matrices instead 
are assumed square and selfadjoint. 



C. Formalizing the diagrammatic method for selfadjoint ma- 
trices 

A standard, selfadjoint, Gaussian n x n random matrix X 
can be written on the form X = -^=(Y + Y H ), where Y is an 
nxn standard complex Gaussian matrix. We can thus compute 
the moments of RX and R + X (with X selfadjoint Gaussian, 
and R selfadjoint and independent from X) by substituting 
this, and summing over all possible combinations of Y and 
Y H . This rewriting in terms of complex Gaussian matrices 
means that we need to slightly change the definitions of the 
partitions p and a to the following: 

Definition 5: Let it £ SP p be determined by disjoint subsets 
P!,p 2 of {1, . . . ,p\ with |pi| = \p 2 \ (in particular, 2|pi| < p). 
We associate to ir an equivalence relation p sa = Paa(^) on 
{!,..., 2p} generated by 



1 for i € pi , 
i for i E P2- 

As before, p\ corresponds to choices from X H , pi to 
choices from X, when the selfadjoint Gaussian matrix is 
expressed as a sum of complex Gaussian matrices. Definition!!] 
is modified as follows: 

Definition 6: With n,pi,p2 as in Definition [5] a sa — 
c S a(7r) is defined as the equivalence relation on D = 
(pi U P2Y generated by the relations 

k~ asa k + l if k,k + l£D (11) 
fc~ CTaa Z if k,l€ D,fc + 1- Psa / 

or k ~ Pea I + 1. (12) 

Define also d(p sa ) as the number of blocks of a Psa which 
intersect BU(D+ 1). 

As an example, In Figure |5Jc) we have that p sa — 
{{1,2}, {3, 4}}, o sa = {{1},{3}}, in Figure Sd) we have 
that p sa = {{1,2,3,4}}, a sa is the empty partition. 

In the following, we will state our results in terms of 
normalized traces. We remark that some of these have been 
stated previously in terms of non-normalized traces 04). In 
some results, we have substituted c = j^, which makes the 
results compatible with the asymptotic case often used in the 
literature, where n and N grow to infinity at the same rate, 
the rate being c = lim„^oo -S*. In the following, equivalence 
relations will interchangeably also be refered to as partitions. 

III. Statement of main results 

The main results of the paper are split into three sections. 
In the first, basic sums and products are considered, basic 
meaning that there is only one random matrix involved. In 
the second section we expand to the case where independent 
random matrices are involved, in which case expectations 
of products of traces are brought into the picture. In these 
two sections, all Gaussian matrices are assumed complex and 
rectangular, for which the results relate to the moments of 



A. Basic sums and products 

Our first and simplest result concerns the moments of a 
doubly correlated Wishart matrix. These matrices are the 
most general known form we have found which have been 
considered in the literature 1251 . which can be addressed by 
our results: 

Theorem 1: Let n, N be positive integers, X be n x N 
standard, complex, Gaussian, and D a (deterministic) nxn 
matrix, E a (deterministic) N x N matrix. For any positive 
integer p, 



E 



tr 



— DXEX H 

N 



E 

7reS D 



jY fc (p)- 



P n i( P )- 



1 Tp 
^ p\odd -^pleven • 



Theorem[T]is proved in Appendix[A] The next results will be 
proved using the same techniques, and will therefore be given 
shorter proofs. The special case of a product of a Wishart 
matrix and a deterministic matrix, and a Wishart matrix itself, 
can be considered as a special case. It is seen that, for the 
latter, the contribution for the identification of edges refered to 
in Figure[T]is equal to N k ( p ' > ~ p n l ( p ' > ~ 1 , which indeed depends 
only on n, N, and the number of even-labeled and odd-labeled 
vertices in the resulting graphs. As an example, the contribu- 
tions from the two possible identifications of edges giving the 
second moment of a Wishart matrix is iV 1_2 n 2_1 = % = c 
(Figure da)) and N^n 1 - 1 = 1 (Figure \Bb)). The second 
moment is thus 1 + c, which also can be infered from the more 
general formuals in Section IIVI 

We remark also that other closed forms of (13[ can be 
found in the literature. When the Wishart matrices are one- 
sided correlated (i.e. E — I), [16| gives us the means to find 
the first order moments (i.e. one circle only is involved) in 
certain cases, also if the p'th moment is replaced with more 
general functionals of X. It seems, however, that this result 
and the techniques used to prove it are hard to generalize. 

We now turn to the moments of (D + X)(E + X) H . In 
the large n, A^-limit, the case where D = E is related to the 
concept of rectangular free convolution |26l . which admits a 
nice implementation in terms of moments |27|. When n and 
N are finite, the following will be proved in Appendix 151 

Theorem 2: Let X be an nx N standard, complex, Gaussian 
matrix, D,E deterministic n x N matrices, and set D p = 
tr ((|DE iJ ) p ). We have that 



tr 



1(D + X)(E + X) ff y) 



E 



TresPp 



iN\pi\ 



jyk(p(ir))-kd{p(-K)) 



A(p(n))-ld{p(ir)) 



xn 



kW.I/2- 



(13) 
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Note that in Theorem [2] n- and iV-terms have not been 
grouped together. This has been done to make clear in the 
proof the origin of the different terms. 

B. Expectations of products of traces 

Theorems Q] and [2] can be recursively applied, once one 
replaces D and E with random matrices. In this process, we 
will see that expectations of products of traces are also needed, 
not only the first order moments as in theorems Q] and [2] The 
recursive version of Theorem Q] looks as follows. 

Theorem 3: Assume that the n X n random matrix R and 
the N x N random matrix S are both independent from the 
n x N standard, complex, Gaussian matrix X, and define 



M, 



= E [tr (R/ 1 ) tr (R* 2 ) 
xtr(S mi )tr(S m2 )--- 
1 



tr 



Pi,— )P* 



= E 



tr 



N 



tr(S 
RXSX H 



(R z - 



x tr 



xtr 



1 



RXSX 



H 



— RXSX ff 

N 



T>2 



Pk 



Set p = pi + ■ ■ ■ + pk, and let as before lx, . . . , l r be 
the cardinalities of the blocks of odd numbers only of p, 
mi, . . . , m s be the cardinalities of the blocks of even numbers 
only of p, with k(p), l(p) the number of blocks consisting of 
even and odd numbers only, respectively. We have that 



■,Pk 

E 



li , . . . ,l r .mi , . . . ,m s 



(14) 



Proof: There are only two differences from Theorem Q] 
First, n^ 1 is replaced with rT k , since we now are taking k 
traces instead of 1 (we modify with additional trace normal- 
ization factors). Second, we replace the trace of a deterministic 
matrix with the expectation of a random matrix. It is clear that 
the only additional thing needed for the proof to be replicated 
is that the random matrices X and R are independent. ■ 

In some cases, for instance when S = /, Theorem [3] 
also allows for deconvolution. By this we mean that we can 
write down unbiased estimators for R Plj ..., Pk , (which is the 
simplified notation for the mixed moments for the case where 
S = I) from an observation Y of -^RXX fl . This is achieved 
by stating all possible equations in ( TBI (i.e. for all possible 
Px, . . . ,Pk), and noting that these express a linear relationship 

between all {/»',, !> . and all {M pi ^ ph } pu ,„ tPk , 

where there are as many equations are unknowns. The im- 
plementation presented in Section |IV] thus performs deconvo- 
lution by constructing the matrix corresponding to (TBI) , and 
applying the inverse of this to the aggregate vector of all mixed 
moments of the observation. We remark that the inverse may 
not exist if N < n, as will also be seen from expressions for 
these matrices in Section HV1 

It is also clear that the theorem can be recursively applied to 
compute the moments of any product of independent Wishart 



matrices 

D ^ XlXf i X2X -"^ XfcX " (15) 

where D is deterministic and Xj is an n x Ni standard complex 
Gaussian matrix. The R's during these recursions will simply 
be 

1 „ „„ 



Ri 


= ^ 


XxXf 


1 


X 2 X^ 


R 2 


Nx 


XxXf 


1 


X 2 X^ 



1 



X fe _ 2 Xf_ 2 



Rfc-D. 

Unbiased estimators for the moments of D from observations 
of the form ( fl3b can also be written down. Such deconvolution 
is a multistage process, where each stage corresponds to 
multiplication with an inverse matrix, as in the case where 
only one Wishart matrix is involved. 

The recursive version of Theorem |2] looks as follows. 

Theorem 4: Let X be an nx N standard, complex, Gaussian 
matrix and let R, S be n x N and independent from X. Set 



R 



Vl,—,Pk 



tr 



x tr 



xtr 



1 

N' 



RS 



H 



N 

-RS H 

N 



P2 



M„ 



E 



tr 



> 



x tr 



xtr 



(R 

(R- 



We have that 



M, 



PX, — ,Pk 



E 



T = 7i-(p 1 ,p 2 ,q) 



•X)(S + 

+ x)(s^ 

f X)(S + 

i 



X 



k(p(7v))-kd(p(7v)) 



X N' 

x n l(p(7T))-ld(p(7T)) n \a\ 



X R, 



l u ...,, 



(16) 



where lx, ■ . ■ , l r are the cardinalities of the blocks of a, divided 
by 2. 

The proof is omitted, since it follows in the same way The- 
orem Q] was generalized to Theorem [3] above. Deconvolution 
is also possible here. It is in fact simpler than for Theorem [3] 
in that there is no need to form the inverse of a matrix IfTTll . 
This is explained further in the implementation presented in 
Section |IV] 

C. Selfadjoint Gaussian matrices 

The analogues of Theorem[T]and Theorem[2]when the Gaus- 
sian matrices instead are selfadjoint look as follows. Since 
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Theorem Q] and Theorem [2] had straightforward generalizations 
to the case where all matrices are random, we will here assume 
from the start that all matrices are random: 

Theorem 5: Assume that the n x n random matrix R is 
independent from the n x n standard selfadjoint Gaussian 
matrix X, and define 

Rp u ..., Ph = E [tr (R^ 1 ) tr (R P2 ) • • • tr (RP»)] 



M Pu ..., Pk =E 



tr 



x tr 



xtr 



R^X 

V» , 

R^X 

\/n 



R-i=X 



Set p = pi ^ and let Zi, . . . , l r be the cardinalities of 

the blocks of p sa . We have that 

2 -p/2 n r-p/2-k Rh ^ l ^ (n) 



M, 



Px,—,Pk 



E 



Pll = IP2l=p/2 
P1:P2 disjoint 

Proof: The proof follows in the same way as the proofs 
in Appendix [A] and [B] We therefore only give the following 
quick description on how the terms in ST% can be identified: 

• 2~ p / 2 comes from the p normalizing factors in 

• n r comes from replacing the non-normalized traces with 
the normalized traces to obtain Rt lt ... t i r , 

• n~ p / 2 comes from the p normalizing factors A= in 
R^=X, 



n 



~ k comes from the k traces taken in M, 



Proof: The items in (fT8l > are identified as follows: 
. 2~l pi l comes from the normalizing factors in the 2\p\\ 
choices of -±(Y + Y H ), 

comes from the normalizing factors -h= in the 



,-|pi| 



2\pi\ choices of r^X, 

n |p(w)a |-d(p(7r) ao ) comes f rom counting the vertices 
which do not come from applications of (fT2b . 



n k comes from the k traces taken in M, 



in- 



•iPfc' 



• tt.! " 3 " I comes from replacing the non-normalized traces 
with the normalized traces to obtain i2j I j p . 

■ 

Recursive application of theorems [3] lU E] and [6] allows 
us to compute moments of most combinations of independent 
(selfadjoint or complex) Gaussian random matrices and deter- 
ministic matrices, in any order, and allows for deconvolution 
in the way explained. This type of flexibility makes the method 
of moments somewhat different from that of the Stieltjes trans- 
form, where expressions grow more complex, when the model 
grows more complex. Moreover, contrary to methods based 
on the Stieltjes transform, the results scale in terms of the 
number of moments: from a given number of moments, they 
enable us to compute the same number of output moments. 
The theorems also enable us to compute second order moments 
(i.e., covariances of traces) for many types of matrices, using 
the same type of results. Asymptotic properties of such second 
order moments have previously been studied 11281 . [29|, |30|. 
While previous papers allow us to compute such moments and 
second order moments asymptotically, in many cases the exact 
result is needed. 



Pi,— iP** 



Similarly, the result for sums involving selfadjoint matrices 
takes the following form: 

Theorem 6: Let X be an n x n standard selfadjoint Gaussian 
matrix, and let R be n x n and independent from X. Set 



Rpi Pk ~ ^ 



tr 



1 



R 



x tr 



xtr 



R 



P-2 



E 



tr 



^(R + x) 

Jn 



pi 



x tr 



xtr 



(r + x: 



(r + x; 



Pk 



Set p = pi + ■ ■ ■ +Pk, and let l\, . . . , l r be the cardinalities of 
the blocks of a sa from Definition [6] We have that 



M, 



Px,—,Ph 



E 



|Pll = |P2l<P/2 
PI ,P2 disjoint 

x n 



-<*(p(w)«)-fc+Kal 



X R h,-,lr 



(18) 



IV. Software implementation 

Theorems [3] |U [5] and [6] present rather complex for- 
mulas. However, it is also clear that they are imple- 
mentable: all that is required is traversing subsets (pi,p2), 
permutations (tt, q), and implement the equivalence relations 
p(tt), <t(7t), p(7r) sa , a(w) sa from tt. Code in Matlab for doing 
so has been implemented for this paper PP . as well as the 
equivalence relations we have defined. Also, the implementa- 
tion stores results from traversing all partitions in matrices, 
and this traversal is performed only once. Our formulas 
are thus implemented by multiplying the vectors of mixed 
moments with a precomputed matrix. These operations are also 
vectorized, so that they can be applied to many observations 
simultaneously (each vector of mixed moments is stored as 
a column in a matrix, and one larger matrix multiplication is 
performed). Representing the operations through matrices also 
addresses more complex models, since many steps of matrix 
multiplication are easily combined. In 11321 . documentation 
of all public functions in this library can be found, as well 
as how our methods for Gaussian matrices can be combined 
with other types of matrices. The software can also generate 
formulas directly in ETgX, in addition to performing the 
convolution or deconvolution numerically in terms of a set of 
input moments. All formulas in this section have in fact been 
automatically generated by this implementation. For products, 
we have written down the matrices needed for convolution 
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and deconvolution, as described previously. For sums, we have 
only generated the expressions for the first moments. Due to 
the complexity of the expressions, it is not recommended to 
compute these by hand. 



A. Automatically generated formulas for theorems\3\and\2\ 

We obtain the following expression for the first three 

moments in Theorem [3] where R Pl Pk (we consider only 

one-sided correlated Wishart matrices) and M Plj ... tPk are as in 
that theorem: 





More generally, in order to compute the moments of products 
of Wishart matrices, we need to compute matrices as above 
for the different sizes of the different Wishart matrices, and 
multiply these. In Section [V] we will see an example where 
two Gaussian matrices are multiplied. Note that the matrices 
from above are not invertible when N = 1. 
Defining 



D p = tr 



M p = E 



i \ p 

-BT> H 

N 



tr 



1(D + X)(D + X)« 



in accordance with Theorem [2] the implementation generates 
the following formulas: 



Mi 

M 2 
M 3 



M 4 



Di + 1 

D 2 + (2 + 2c)D 1 + {l + c) 
D 3 + (3 + 3c) D 2 

( 3 

l2 i / q i n„ i Q„2 _i 

TV 2 

1 

TV 2 

D± + (4 + 4c) D 3 + 8cD 2 D 1 
2 16 
" + iV 2 
(14c + 14c 2 ) D\ 



+3cDf+ ( •'■! + <ic i-3r 

l + 3c 

f (4 + 

6 + 16c + 6c 2 



D 1 



Do 



4 + 24c + 24c 2 + 4c d 



1 + 6c + 6c 2 + c 3 



20 + 20c 



N 2 



D] 



5c 



N 2 



B. Automatically generated formulas for theorems\5\and\6\ 

We obtain the following expression for the first four mo- 
ments in Theorem |5] where Rp lt .. Pk and M PU „, iPk are as in 




l 




1 



i?2 

R 



that theorem: 

'M 2 \ ( 1 

M M ) \ 4, 

M 4 
M 2>2 
M 3)1 
M 2) i,i 

V M M , ia y \ o ^ 

(since Mi = A/3 = 71/2,1 = -^1,1,1 = 0). The implementation 
is also able to generate the expected moments of the product 
of any number deterministic matrices, independent, selfadjoint 
(or complex) Gaussian matrices, in any order l32l . This is 
achieved by constructing the matrices for the selfadjoint and 
complex cases as above, and multiplying the corresponding 
matrices together in the right order. 
Defining 



1,1 J 
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\ 


/ 







1 




R2.2 










R3,l 


j_ 

a- 







-R 2 ,i,i 
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Ri, 1,1,1 











Dp 


= tr 


(( 










M p 


= E 


tr 











(D + X) 

in accordance with Theorem |6j the implementation generates 
the following formulas: 



Mi 
M 2 
M 3 
M 4 



D 2 + l 
D 3 + 3D 1 
D 4 + AD 2 



2D\ 



V. Applications 

In this section, we consider some wireless communications 
examples where the presented inference framework is used. 

A. MIMO rate estimation 

In many MIMO (Multiple Input Multiple Output) antenna 
based sounding and MIMO channel modelling applications, 
one is interested in obtaining an estimator of the rate in a 
noisy and mobile environment. In this setting, one has M noisy 
observations of the channel Y; = D + erN^, where D is an 
71 x N deterministic channel matrix, is an n, x JV standard, 
complex, Gaussian matrix representing the noise, and a is the 
noise variance. The channel D is supposed to stay constant 
during M symbols. The rate estimator is given by 



(19) 



where p — 4y is the SNR, and are the eigenvalues of 
■^DD^. This problem falls within the framework we are 
proposing. The extra parameter a did not appear in any of 
the main theorems. In 13TI . it is explained how this is handled 
by the implementation using our results. 

We would like to infer on the capacity using our moment- 
based framework. We are not able to find an unbiased estima- 
tor for the capacity from the moments due to the logarithm in 
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dT9l >, but we will however explain how we can obtain an unbi- 
ased estimator for the expression fTCLi (1 + pAi) use d m ©■ 
This is simplest when a limitation on the rank, rank(DD^) < 
k, is known 0. On the assumption of such a limitation, we can 
write nlLi C 1 + P^i) = 1 + Er=i P r n r (Ai, • • • , A n ), where 



IIi (Ax, 

n 2 (A 1; 



n n (Ai 



,A„) = Ai + - 
,A„)= E 



+ A„ 



AiAj 



!<2<j<n 



, A n ) — Al • • ■ A r , 



are the elementary symmetric polynomials. With D n the 
moments of -^DD^, and D p as in Definition Q] The Newton- 
Girard formulas [34| (slightly rewritten) say that we can find 
coefficients a n so that 



n fc (Ai,--- ,A„) 



a p D p . 



In Section [TTT] we explained how we can obtain unbiased 
estimators for the D p on the right hand side from the noisy 
observations Y^. We can thus also obtain unbiased estimators 
for the Ilfe(Ai, ■ • • , A„). Due to the rank restriction, only k of 
the A; are nonzero, so that only ITi, II2, 11^ can be nonzero. 
We thus obtain an unbiased estimator for n™=i(l + pAi), 
since this can be written as a linear combination of the IL;. 
In the following, all rate estimations will follow this strategy 
by first computing an unbiased estimate for n™=i(l + pAj)> 
and substituting this in $1% . As with Theorems [3] and [4] such 
an estimator thus scales in terms of the moments: it depends 
on the first k moments of the observations only, once the 
restriction rank(DD^) < k is known. 

The inference methods in Section [Till are formulated for the 
case of one observation only. When we have many observa- 
tions, we have some freedom in how they are combined into 
new estimators: 

1) we can form the average j;^2f = i(D + crN^) of the 
observations, and use that this has the same statistical 
properties as D + with N again standard, com- 
plex, Gaussian, 

2) we can stack the observations into a compund observa- 
tion matrix. In ifTTIl it was shown how such matrices can 
be included in the same inference framework, so that our 
methods also apply to them, 

3) we can take the average of the moments we obtain from 
applying the framework to each observation separately. 

In ifTTl . the variances of the estimators for the moments are 
analyzed, and it is shown that the two first strategies above 
provide lower variance than the third strategy, and that the first 
two strategies have comparable variances. We will therefore 
in the following apply the framework with the first strategy. 
We have tested two cases. First a 2 x 2-matrix 
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(20) 
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Fig. 6. Estimation of the channel capacity using the method of moments for 
the 2 X 2-matrix i20\ for various number of observations, p = 5. 



was used with p = 5 and different number of observations. 
The corresponding simulation is shown in Figure [6] The fact 
that the channel matrix is diagonal is irrelevant for the rate 
estimation. In the second case a 4 x 4-matrix 



D 



/ 1 \_ 

0.5 

2 

\ 1 / 



(21) 



was used, with p = 10 and different number of observations. 
The corresponding simulation is shown in Figure [7] In the 
second case, the number of variables to be estimated is higher 
than in the 2 x 2-matrix case (4 eigenvalues instead of 2). 
In general one should then expect that more symbols are 
needed in order to obtain the same accuracy in the estimation. 
Although the figures partially confirm this, the different matrix 
sizes in the two cases makes the situation somewhat more 
involved (the moments converge faster for matrices of larger 
size). 



B. Understanding the network in a finite time 

In cognitive MIMO Networks, one must learn and control 
the "black box" (wireless channel for example) with multiple 
inputs and multiple outputs (Figure [SJ within a fraction of time 
and with finite energy. The fraction of time constraint is due 
to the fact that the channel (black box) changes over time. 
Of particular interest is the estimation of the rate within the 
window of observation. 

Let y be the output vector, x and n respectively the input 
signal and the noise vector, so that 



3 In 1331 . the rate was also estimated, but without actually using unbiased 
estimators for products of traces, as formulated in Section Mill 



y = x + cm. 



(22) 
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Fig. 7. Estimation of the channel capacity using the moment method for the 
4 X 4-matrix 1211 for various number of observations, p = 10. 




Fig. 8. Cognitive MIMO Networks 



In the Gaussian case, the rate is given by 



C = H(y) - H (y|x) 
= log 2 det(7reRy) 
/ det(Ry 

log 2 



log 2 det(7reRjv) 



v det(R;v) 

where Ry is the covariance of the output signal and Rat is 
the covariance of the noise. Therefore, one can fully describe 
the information transfer in the system knowing only the 
eigenvalues of Ry and R^r. Unfortunately, the receiver has 
only access to a limited number of L observations of y, and 
not the covariance of Ry. However, in the case where x and n 
are Gaussian vectors, y can be written as y = R|,u where u is 
an i.i.d standard Gaussian vector. The problems falls therefore 
in the realm of inference with a correlated Wishart model 



L H 



In the simulation we have taken n as an i.i.d. standard 
Gaussian vector of dimension 2, and 



R 



x 




0.5 2 



(23) 



and have used Theorem [4] to take care of the additive part, 
following up with Theorem [3] to take care of the Gaussian 
part of x. Considering L observations of d22l . we unfortu- 
nately can't use the same procedure as in Section IV-AI (i.e. 



Fig. 9. Estimation of the capacity for the model (22) up to L = 500 
observations, with a = 0.5. 



averaging the observation vectors first), since the matrices 
corresponding to (TT4T > are not invertible for N — 1. Instead 
we have stacked the observations as columns in a compound 
matrix, and applied the framework to this in order to get an 
unbiased estimate of the moments of Rj. In Figure|9] we have 
followed the same procedure as explained in Section IV-AI for 
estimating the capacity from these moments. To demonstrate 
the convergence to the true rate, we have also increased 
the number of observations. In order to also estimate the 
eigenvalues of Rx, we can first get unbiased estimates for the 
elementary symmetric polynomials as in section IV-AI hence 
also for the characteristic equation of Rx, and solve this. 
Similarly to the case for the capacity, is is only the estimate for 
the characteristic equation which is unbiased, not the estimates 
for the eigenvalues themselves. In Figure [10] we have shown 
the estimates for the eigenvalues of Rx obtained in this way. 



C. Power estimation 

Under the assumption of a large number of observations, 
our finite dimensional inference framework was not strictly 
needed in the two previous examples: the observations could 
instead be stacked into a larger matrix, where asymptotic 
results are more applicable. When the asymptotic result can 
be used, inference in terms of the moments becomes simpler, 
due to the almost sure convergence of the empirical eigenvalue 
distributions of the matrices [8|. In the asymptotic regime, 
Theorems Q] [2] and [6] can in fact all be implemented by 
direct application of additive free- and multiplicative free con- 
volution, and the moment-cumulant formula (35), f° r which 
efficient implementations exists lfP3l . without the need for 
iterating through all partitions. Theorem[5]can be implemented 
in terms of the 5-transform [5 |, which has an implementation 
in terms of power series [36|, also without the need for 
iterating through all partitions. 

This section describes a third model, where it is unclear how 
to apply such a stacking strategy, making the finite dimensional 
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Fig. 10. Estimation of the eigenvalues for the 2 X 2-matrix Rjf of i23) for 
various number of observations. 



results more useful. In many multi-user MIMO applications, 
one needs to determine the power with which the users send 
information. We consider the system given by 



WP5 Si + am 



(24) 



where W, P, Sj, and are respectively the N x K channel 
gain matrix, the K x K diagonal power matrix due to the 
different distances from which the users emit, the Kxl matrix 
of signals and the N x 1 matrix representing the noise with 
variance a. In particular, W, Sj,nj are independent standard, 
complex, Gaussian matrices and vectors. We suppose that we 
have M observations of the received signal y^, during which 
the channel gain matrix stays constant. Considering the 2x2- 
matrix 

1 
0.5 



Pi = 



(25) 



applying Theorem [4] first, and then Theorem [3] twice (each 
application takes care of one Gaussian matrix), we can es- 
timate the moments of the matrix P from the moments of 
the matrix YY H , where Y = [yi, • • • , yjw] is the compound 
observation matrix. We assume that we have an increasing 
number of observations (L) of the matrix Y, and take an 
average of the estimated moments (we average across several 
block fading channels). From the estimated moments of P we 
can then estimate its eigenvalues as in Section IV-AI When 
L increases, we get a prediction of the eigenvalues which is 
closer to the true eigenvalues of P. Figure QT| illustrates the 
estimation of eigenvalues up to L = 1200 observations. 

It is possible to compute the variance of the moment 
estimators for the model (l24l . We do not write down expres- 
sions for these, but remark that the framework is capable of 
performing this tedious task. These expressions turn out to 
involve combinations of K, M, and N in the denominators, 
so that in order for the variance to be low, large values for 
K, M, N are required. In Figures [T2] and Qj] we note that the 
variance decreases much faster when we increase K, M, N 
jointly, than when we increase the number of observations. 



Fig. 1 1 . Estimation of the powers for the model )24h where the number L 
of observations is increased, the sizes of the matrices are K = N = M = 2 
and a = 0.1. The actual powers are 0.25 and 1. 
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Fig. 12. Estimation of the powers for the model {24), where the size K = 
TV = M of the matrices is increased, the number of observations is fixed 
L = 15 and a = 0.1. The actual powers are 0.25 and 1, 



VI. Conclusion and further work 

In this paper, we have introduced a framework which 
enables us to compute the moments of many types of combina- 
tions of independent Gaussian- and Wishart random matrices, 
without any assumptions on the matrix dimensions. We also 
explained an accompanying software implementation, and also 
some useful applications where the framework has been used 
for simulations. 

Future work will focus on applying and extending the 
framework to other types of matrix models. It may also 
be possible to extend the framework to obtain not only the 
moments we consider, but also the negative moments ll37l . 

While the formulas presented here have been generated by 
traversing sets of partitions, there may exist expressions for 
the same formulas which are more efficient to compute, as 
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Fig. 13. Estimation of the powers for the model {24), where the size K = 
N = M of the matrices is increased, the number of observations is fixed 
L = 50 and a = 0.1. The actual powers are 0.25 and 1. 



has been found at least in one case 11161 . Future work will 
also attempt to find such simpler expressions. This is a must 
if the method of moments needs to compute moments of order 
much higher than used here. 

Since the method of moments only encodes information 
about the lower order moments, it lacks much information 
which is encoded naturally into Stieltjes transform, so that 
spectrum estimation based on the Stieltjes transform has much 
better performance when few moments are considered. Once 
one can find simpler expressions for higher order moments, 
an interesting project would be to find how many moments 
are typically needed in order for the method of moments to 
perform close to the Stieltjes transform method, methods. 

Appendix A 
The proof of TheoremQ] 

In order to prove Theorem [U we will expand the moments 
tr (DXEX H ) P 



E 



(26) 



following in the footsteps of [ 14|, and in the process generalize 
results therein, since no deterministic part was involved in 
that paper. We will thus in the following rewrite some of 
the important parts in the proofs in fl4l . since these are 
needed in our generalizations. First, we will need the following 
proposition. 

Proposition 1: Let X be n X N standard, complex, Gaus- 
sian, and D a deterministic n x n matrix. Let p be a positive 
integer, then 



E 



tr (DXEX ff ) P 

= J2 E [ tr ( dx i ex ^(d • • • DX P EX ?( P ) 

TTESr, 



where X 1: . . . , X p are independent nx N standard, complex, 
Gaussian matrices. 



Proof: Let (Xj) igN be a sequence of independent nx N 
standard, complex, Gaussian matrices with entries x(u,v,i), 
1 < u < n, 1 < v < N. For any s € N, the matrix 
s -1 / 2 (Xi + • • • + X s ) is again n x N standard, complex, 
Gaussian. Hence, we can write 



E 



= E tr 



tr (DXEX H ) P 

{tr [D (s-WpLi + H-X...!) E 

V2( Xl + ...+X,)' 



i<*i)ji)—>»p>ip<« 



E 



tr DX n EX£---DX lp EX 



3p J 



Denoting by d(i,j) the elements of D, e(i,j) the elements of 
E, we have that 



E 



= n 1 x 



tr(DX n EX£...DX 4p EX£ / 

d(u p ,Vx)---d(up- 1 ,Vp)-; 



i < i. 



n>«2 u p: 

isn.«2 «p<» 

l<i U1 ,ra 2 ,...,rap<N 



e(wi,yi)---e(w p ,y p )x 



E[x(vi,wi,ii) x x(u 2 ,yi,ji) x 



x(vp,Wp,ip)x(ui,y p ,j p )], (27) 

and we need only sum over conjugate pairings of the Gaussian 
variables, i.e. for a 7r € S p we have 



(28) 



for all h. Hence, we only have to sum over those 2-tuples 
(ii,ji,---,i P ,jp) that are in 

M(tt,s) = {(zi, ji, . . . ,ip,j p ) S {l,2,...,s} 2p | 

jl = *7r(l); ••• >jp = V(p)} ■ 

for some ir £ S p , i.e. 



E 



tr (DXEX 

E 



E 



H 



tr DXi, EXf • • • DXi EX, 



We observe that the sets M(tt, s) are not disjoint, but if we 
put 

D( s ) = {( il ,j 1 ,...,z p ,j p )e{l,2,..., s } 2p | 

i\, i2, • ■ • i p are distinct} 
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the sets M(ir,s) f] 2)(s), tt E S p , are disjoint. Thus, we can 
write 



E 



tr (DXEX H ) P 

E E 



tr ( DX„ EXf • • • DX 4 EXf 



ip,j P )eM( w ,s)nD( s ) 



-v 



£ E [tr (DX n EX? • • • DX V EXg 



„y f )j«(.,.)\ii(i) 



(29) 



All . . . ,ip,j p ) € M(7r, s) n D(s) give the same 

contribution in the above sum, so that we can write the first 
term of i 



s— foo 

so that the first term of 



s- p ^2 card(Af(7r,s)nD(s))x 

E[tr(DX 1 EXf (1) ...DX p EX^ (p) 

Since the cardinality of M(tt, s) fl D(s) is equal to s(s — 
1) • • • (s — p + 1), we have 

lim s _p card(M(7r, s) n D(s)) = 1, 

i tends to 

E ^ (DXiEX£ (1) • • • DX p EXf (p) )^ 

as s —> oo. Observing that 

S - p card(M(7r, s) \ D(s)) 

= [s- p card(Af(7r, s)) - s~ p card (M(tt, s) n D(s))] 
= [1 - s" p card (M(tt, s) n D(s))] — > 0, 

as s — > oo, and summing over tt E S p , we see that the second 
term in d29l tends to 0, and (|27T i follows. ■ 
Theorem Q] will follow from Proposition Q] the following 
proposition, and insertion of the additional iV _p -f actor in ( fT3] >: 

Proposition 2: For any positive integers n, N, any tt E S p 
and any D deterministic n x n matrix, we have 



E 



tr (DXxEX" } ...DX p Xf ( 



t(p) 



with p,k(p),l(p) as in Definition [3] 

Proof: Inserting ( l28T > into (|27] > we obtain 



(30) 



E [tr ( (DXEX") P 



n 1 x 



E E d (« W (p)) w l)--- d ( W 7r(p-l),«p)X 

i<yi 1 1/2 ^ ■ ■ ■ i^p < w 

eK-'WifO ' ' ' e ("«V-i(p)' y P ) 

71 1 E E d ( U 7r(p),«l)---rf('W7r(p-l)! U J>) 

I E ^- I (i),y 1 )"'e(y ir -i( I ,] ) s/ p ) 

,l<3/i,2/2,---,y P <JV 



The result will follow from analyzing the terms in this expres- 
sion. 

p restricted to the even numbers is generated by the relations 

2j~2n(j), j£{l,...,p}. 

Mapping even numbers < 2p onto {1, . . . ,p}, this is equiv- 
alent to j ~ 7r(j), j E {1, . . . i.e., the blocks consisting 
of even numbers are in one-to-one correspondence with the 
cycles of tt. From this it follows that 



E 

i<yi,y2,...,y P <N 



e(Vw-Hi),yi) ■ ■•e(y w -i( J> ),y p ) = iV fc(p) E p | e 



(31) 

since the matrix indices follow the cycle structure of tt. Here 
jyfc(p) comes from the fact that the summand is a product of 
k(p) non-normalized traces of N x TV-matrices. 

p restricted to the odd numbers is generated by the relations 

2i-l~2 7 r- 1 (j) + l. 

Mapping odd numbers < 2p onto {1, . . . ,p}, this is equivalent 
to j ~ 7r _1 (j) + 1, j E {1, . . . ,p}. From this it follows that 



E 

l<vi ,V2 ,...,v p <n 



d(v^ p ),vi) ■ ■ ■ <2(v„.0-i)> " P ) = n i(p) D p | 0dd 

(32) 



The result now follows by inserting (1311 1 and (f32t . 



Appendix B 
The proof of Theorem[2] 

Since only conjugate pairings of Gaussian variables con- 
tribute, we need only consider partial permutations. The con- 
tribution from the partial permutation tt = tt(p%, p2,q) can be 
written 



n 1 x 



i<«i,«a 
i<t»i,to a 



e n d ( vi > w ^> n e(v i+ i, wo > 



x{vp(i),w p(1) , p{l))x{v p(1)+1 ,w p(1) , p 2 (g(l))) x 



x ( v pi(\pi\)' W pi(\pi\)iPl(\Pl\)) x ( v pi(\pi\)> w pi(\pi\)^P2(q(\Pl\))) 

Note that if 2k— 1, 2k E T) (i.e. the first relation (0 generating 
a), so that k E p\ n p|, we find d(«fe, Wk)e(vk+i, Wk) as a 
part in the matrix product above, which is a part of the matrix 
product DE H . Similarly, if 2k, 2k + 1 E D, we find a part of 
the matrix product ~E H T). 

On the other hand, if 2k - 1,21 E T> with (2k - 1) + 
1 = 2k ^ p 21 (i.e. the second relation (TToT > generating cr), 
we find that u^. = wi as in Appendix [A] so that we find 
d(vk,Wk)e(vi+i,Wk) as a part in the matrix product, which 
again is a part of the matrix product DE H . We can reason 
similarly when k and I swap roles, to find a part of the matrix 
product E ff D. 

In conclusion, the relations (0 and ( flOb reflect a cyclic 
product of the deterministic elements, the length of the product 
equaling the number of elements in the corresponding block 
of a. Moreover, it is clear that the D and E H appear in 
alternating order in the corresponding matrix product. In 
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particular, all blocks of a have even cardinality. The matrix 
product constitutes a non-normalized trace. Thus, if <ii is the 
i'th block in er, |aj| is even, and the matrix product of the 
deterministic elements is 

IjTr((DE H ) |ff ' l/2 ) = J]^tr((DE ff )l ff -l/ 2 )). (33) 

i i 

(1331 , which is seen to be the last term in (Qj]), thus contributes 
in tr(((D + X)(E + X.) h )p). The other terms in O are 
identified as follows: 

• the first n in the first term .} , „, comes from taking 
the trace, while N' Pl ' comes from the normalizing factor 
for the Gaussian terms (the normalizing factors for the 
deterministic terms were absorbed in their definition). 

. ]\[k(p)-kd(p) correS p 0nc i s to the number of all the choices 
of blocks of p with even numbers only, which do not 
intersect T> U (D + 1), 

• n l ( p ' ld ( p ) corresponds to the number of all the choices 
of blocks of p with odd numbers only, which do not 
intersect D U (D + 1), 
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